Multi-processor FFT

Authors

  • Michel Jacquemin
  • Robert L. Krawitz
  • Lennart Johnsson
Abstract

Computing the Fast Fourier Transform on a distributed-memory architecture by a direct pipelined radix-2 algorithm, a bi-section algorithm, or a multi-section algorithm yields the same communications requirement in all three cases, provided that communication for all FFT stages can be performed concurrently, the input data is in normal order, and the data allocation is consecutive. With a cyclic data allocation, or with bit-reversed input data and a consecutive allocation, multi-sectioning offers a communications requirement reduced by approximately a factor of two. For a consecutive data allocation and normal input order, a decimation-in-time FFT requires that P/N + d - 2 twiddle factors be stored for P elements distributed evenly over N processors, with the axis subject to transformation distributed over 2^d processors. No communication of twiddle factors is required. The same storage requirements hold for a decimation-in-frequency FFT with bit-reversed input order and consecutive data allocation. The opposite combination of FFT type and data ordering requires a factor of log_2 N more storage for N processors. The peak performance of a Connection Machine system CM-200 implementation is 12.9 Gflops/s in 32-bit precision and 10.7 Gflops/s in 64-bit precision for unordered transforms local to each processor. The corresponding execution rates for ordered transforms are 11.1 Gflops/s and 8.5 Gflops/s, respectively. For distributed one- and two-dimensional transforms, the peak performance for unordered transforms exceeds 5 Gflops/s in 32-bit precision and 3 Gflops/s in 64-bit precision. Three-dimensional transforms execute at a slightly lower rate. Distributed ordered transforms execute at about 1/2 to 2/3 of the rate of the unordered transforms.
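
The terms "consecutive" and "cyclic" data allocation can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names are hypothetical, and it assumes P and N are powers of two with P divisible by N. For a radix-2 FFT, each stage pairs elements whose indices differ in one address bit, so the sketch lists the address bits whose butterfly partners live on different processors. It does not model the pipelined, bi-section, or multi-section algorithms, nor twiddle-factor storage.

```python
def owner_consecutive(i, P, N):
    """Processor holding element i when each processor stores one contiguous block."""
    return i // (P // N)


def owner_cyclic(i, P, N):
    """Processor holding element i when elements are dealt out round-robin."""
    return i % N


def remote_bits(owner, P, N):
    """Address bits whose radix-2 butterfly partners sit on different processors.

    Assumes P and N are powers of two (an assumption of this sketch).
    """
    bits = []
    for b in range(P.bit_length() - 1):  # log2(P) address bits
        if any(owner(i, P, N) != owner(i ^ (1 << b), P, N) for i in range(P)):
            bits.append(b)
    return bits


if __name__ == "__main__":
    P, N = 16, 4
    print("consecutive:", remote_bits(owner_consecutive, P, N))
    print("cyclic:     ", remote_bits(owner_cyclic, P, N))
```

Under these assumptions, a consecutive allocation places the processor address in the high-order index bits and a cyclic allocation places it in the low-order bits. Both give log_2 N stages that cross processor boundaries; the difference is which stages they are.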

Similar resources

Design and Implementation of 1-D and 2-D Mixed Architecture FFT Processor in Heterogeneous Multi-core SoC based on FPGA

A novel FFT processor architecture that can perform either a 1-D or a 2-D FFT algorithm, depending on the FFT size, is proposed in this paper. The architecture serves as a scalable IP core suitable for heterogeneous multi-core SoC applications. The mixed-architecture FFT processor achieves a balance between high processing speed and resource usage. Compared with a c...

Variable-length FFT Processor Architecture

In this paper, we propose a variable-length FFT processor architecture that can be used in multi-mode and multi-standard OFDM communication systems. Based on a radix-2 single-path delay feedback architecture, the FFT is implemented with the aid of a LUT and control logic. The architecture is expected to support FFT lengths of 2048, 1024, and 512 points. Keywords: FFT; radix-2 FFT; variable-length FFT.

Area-efficient FFT processor for MIMO-OFDM based SDR systems

In this letter, an area-efficient FFT processor is proposed for MIMO-OFDM based SDR systems. The proposed FFT processor can support variable lengths of 64, 128, 256, 512, 1024, 1536 and 2048. By reducing the required number of non-trivial multipliers with a mixed-radix algorithm, the complexity of the proposed FFT processor is dramatically decreased. The proposed FFT processor was designed in a...

A high-speed low-complexity modified radix-2^5 FFT processor for gigabit WPAN applications

In this paper, we present a novel modified radix-2 algorithm for 512-point fast Fourier transform (FFT) computation and a high-speed eight-parallel data-path architecture for multi-gigabit wireless personal area network (WPAN) systems. The proposed FFT processor can provide high data throughput and low hardware complexity by using an eight-parallel data path and multi-path delay-feedback...

Design of 2K/4K/8K-point FFT Processor Based on CORDIC Algorithm in OFDM Receiver

In this paper, the architecture and implementation of a 2K/4K/8K-point complex fast Fourier transform (FFT) processor for an OFDM system are presented. The processor can perform an 8K-point FFT every 273 µs and a 2K-point FFT every 68.26 µs at 30 MHz, which is enough for the OFDM symbol rate. The architecture is based on the Cooley-Tukey algorithm for decomposing the long DFT into short length multi-dimension...

A Novel Architecture for Radix-4 Pipelined FFT Processor using Vedic Mathematics Algorithm

The FFT processor is a critical block in all multi-carrier systems used primarily in the mobile environment. The portability requirement of these systems is mainly responsible for the need for low-power FFT architectures. In this study, an efficient addressing scheme for a radix-4 64-point FFT processor is presented. It avoids the modulo-r addition in the address generation; hence, the critical pa...

Publication date: 2015